56 research outputs found
Accuracy-guaranteed bit-width optimization
Published version
High performance Boson Sampling simulation via data-flow engines
In this work, we generalize the Balasubramanian-Bax-Franklin-Glynn (BB/FG)
permanent formula to account for row multiplicities during the permanent
evaluation and reduce the complexity of permanent evaluation in scenarios where
such multiplicities occur. This is achieved by incorporating n-ary Gray code
ordering of the addends during the evaluation. We implemented the designed
algorithm on FPGA-based data-flow engines and utilized the developed accessory
to speed up boson sampling simulations up to photons, by drawing samples
from a mode interferometer at an averaged rate of seconds per
sample utilizing FPGA chips. We also show that the performance of our BS
simulator is in line with the theoretical estimation of Clifford & Clifford
\cite{clifford2020faster} providing a way to define a single parameter to
characterize the performance of the BS simulator in a portable way. The
developed design can be used to simulate both ideal and lossy boson sampling
experiments. Comment: 25 pages
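To make the Gray-code idea in the abstract above concrete, here is a minimal sketch of the plain Glynn-style permanent formula evaluated in binary Gray-code order. It is not the paper's generalized n-ary variant for row multiplicities and not its FPGA design; the function name `permanent_glynn` is an assumption chosen for illustration.

```python
def permanent_glynn(a):
    """Permanent of a square matrix via the Glynn formula (illustrative sketch).

    Sign vectors delta in {+1, -1}^n (with delta[0] fixed to +1) are visited
    in binary Gray-code order, so consecutive addends differ in one row and
    the column sums can be updated in O(n) instead of recomputed.
    """
    n = len(a)
    if n == 0:
        return 1
    # Column sums for the initial sign vector delta = (+1, ..., +1).
    col = [sum(a[i][j] for i in range(n)) for j in range(n)]
    delta = [1] * n
    sign = 1          # product of all delta entries
    total = 0
    steps = 1 << (n - 1)
    for k in range(steps):
        prod = 1
        for c in col:
            prod *= c
        total += sign * prod
        if k + 1 < steps:
            # The bit flipped between Gray codes k and k+1 is the index of
            # the lowest set bit of k+1; row 0 is never flipped.
            row = ((k + 1) & -(k + 1)).bit_length()
            delta[row] = -delta[row]
            for j in range(n):
                col[j] += 2 * delta[row] * a[row][j]
            sign = -sign
    return total / 2 ** (n - 1)


if __name__ == "__main__":
    print(permanent_glynn([[1, 2], [3, 4]]))
```

Each step costs O(n) for the column update plus O(n) for the product, giving O(n 2^(n-1)) overall for this basic version.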
A Selection of Recent Advances in Computer Systems
This paper presents a selection of recent advances in computer systems. The roadmap for CMOS technology for the next ten years shows a theoretical limit of 0.1 µm for the channel of a MOSFET transistor, reached by 2007. Mainstream processors are adapting to multimedia applications with subword parallel instructions like Intel's MMX or HP's MAX instruction set extensions. Coprocessors and embedded processors are moving towards VLIW in order to save hardware costs. The memory system of the future is going to be the next generation of Rambus/RDRAM. Finally, Custom Computing Machines based on Field Programmable Gate Arrays are one of the promising future technologies for computing, offering very high performance for highly parallelizable and pipelinable applications.
Dynamic Circuit Generation for Boolean Satisfiability in an Object-Oriented Design Environment
We apply our object-oriented design environment PAM-Blox to dynamic generation of circuits for reconfigurable computing. Our approach combines the structural hardware design environment with commercial synthesis of finite state machines (FSMs). The PAM-Blox environment features a well defined hardware object interface and the ability to control the placement of hand-optimized circuits. We integrate the advantages of an object-oriented design environment with full control over placement at every level of abstraction, with commercial FSM synthesis and optimization. As a driving application we consider reconfigurable hardware accelerators for the NP-complete Boolean satisfiability problem. These accelerators require a fast compilation of circuits consisting of instance-specific datapaths and control automatons. By providing FSM optimization and control over placement, our design environment enables the maximization of performance.
Parallel, Pipelined CORDICs for Reconfigurable Computing
Reconfigurable computing has shown impressive successes with data intensive and latency tolerant applications. Pipelined and parallel implementations of CORDICs can achieve very high throughput for rotation, and various other functions such as multiplication, division, as well as hyperbolic and other higher order functions. Reconfiguration allows us to adapt the implementation of CORDICs and related architectures to the specific needs and properties of individual applications or specific sets of applications, hence creating application specific CORDIC implementations. Therefore it is becoming evident that CORDICs are very well suited to reconfigurable computing and custom computing machines.
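As a software illustration of the rotation mode mentioned above, the following is a minimal floating-point sketch of the standard circular CORDIC iteration. It is not the authors' hardware design; the function name `cordic_rotate` and the default of 32 iterations are assumptions. In a pipelined hardware version, each loop iteration would typically become one shift-and-add stage.

```python
import math


def cordic_rotate(x, y, angle, iterations=32):
    """Rotate (x, y) by `angle` radians using circular CORDIC (sketch).

    Converges for |angle| up to roughly 1.74 rad, the sum of the
    elementary angles atan(2**-i).
    """
    # Each micro-rotation stretches the vector by sqrt(1 + 2**(-2i));
    # the accumulated gain is divided out at the end.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    z = angle  # residual angle still to be rotated
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        shift = 2.0 ** -i  # a right shift by i bits in fixed-point hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)
    return x / gain, y / gain


if __name__ == "__main__":
    c, s = cordic_rotate(1.0, 0.0, 0.5)
    print(c, s)  # close to cos(0.5), sin(0.5)
```

Rotating the unit vector (1, 0) yields approximations of the cosine and sine of the angle, which is the usual way to check such an implementation.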
Application-Specific Number Representation
Reconfigurable devices, such as Field Programmable Gate Arrays (FPGAs), enable application-specific number representations. Well-known number formats include fixed-point, floating-point, logarithmic number system (LNS), and residue number system (RNS). Such different number representations lead to different arithmetic designs and error behaviours, thus producing implementations with different performance, accuracy, and cost. To investigate the design options in number representations, the first part of this thesis presents a platform that enables automated exploration of the number representation design space. The second part of the thesis shows case studies that optimise the designs for area, latency or throughput from the perspective of number representations. Automated design space exploration in the first part addresses the following two major issues:
• Automation requires arithmetic unit generation. This thesis provides optimised arithmetic library generators for logarithmic and residue arithmetic units, which support a wide range of bit widths and achieve significant improvement over previous designs.
• Generation of arithmetic units requires specifying the bit widths for each variable. This thesis describes an automatic bit-width optimisation tool called R-Tool, which combines dynamic and static analysis methods, and supports different number systems (fixed-point, floating-point, and LNS numbers).
Putting it all together, the second part explores the effects of application-specific number representation on practical benchmarks, such as radiative Monte Carlo simulation and seismic imaging computations. Experimental results show that customising the number representations brings benefits to hardware implementations: by selecting a more appropriate number format, we can reduce the area cost by up to 73.5% and improve the throughput by 14.2% to 34.1%; by performing the bit-width optimisation, we can further reduce the area cost by 9.7% to 17.3%. On the performance side, hardware implementations with customised number formats achieve 5 to potentially over 40 times speedup over software implementations.
EThOS - Electronic Theses Online Service; Overseas Research Students Award Scheme and UK Engineering and Physical Sciences Research Council; GB; United Kingdom
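To illustrate what a bit-width choice trades off, here is a toy sketch of a purely dynamic (sample-driven) search for the smallest number of fixed-point fractional bits that keeps the rounding error within a bound. It is not R-Tool and does not reproduce its combined static and dynamic analysis; the names `quantize` and `min_fraction_bits` and the sample data are hypothetical.

```python
def quantize(x, frac_bits):
    """Round x to the nearest fixed-point value with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def min_fraction_bits(samples, max_abs_error, limit=52):
    """Smallest fractional bit width whose rounding error stays within the
    bound on every sample (a dynamic check only, valid for these samples)."""
    for bits in range(limit + 1):
        if all(abs(quantize(s, bits) - s) <= max_abs_error for s in samples):
            return bits
    raise ValueError("no bit width up to the limit meets the error bound")


if __name__ == "__main__":
    # Hypothetical sample values and error bound.
    print(min_fraction_bits([0.1, 0.25, 0.7], 0.01))
```

A result obtained this way only holds for the inputs that were simulated, which is one reason a tool might pair such simulation with static analysis.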
Application of reconfigurable CORDIC architectures
Very high performance architectures can be designed for data intensive and latency tolerant applications by maximizing the parallelism and pipelining at the algorithm and bit level. This is achieved by combining such technologies as reconfigurable or adaptive computing and CORDIC style arithmetic, for computing (possibly hyperbolic) rotations, multiply, divide, and related higher order functions (e.g. square-root, multidimensional rotations). Reconfiguration allows adapting the implementation of such functions to the specific needs of individual or specific sets of applications, from multi-media to radar and sonar, hence creating application specific CORDIC-style implementations. We show a high-throughput CORDIC for reconfigurable computing, a low latency CORDIC, and discuss an application to adaptive filtering (normalized ladder algorithm).